Non-record: MDLM Masked Diffusion + Depth Recurrence — val_bpb 1.3428 (8×H100, seed=1337) #1582

Open
He-Wenhao wants to merge 1 commit into openai:main from He-Wenhao:submission/mdlm-depth-recurrence

@He-Wenhao
Summary

val_bpb: 1.3428 (int8+zlib roundtrip) | 14.73 MB | 8×H100 SXM, 600s | Beats #1403 by 0.0057 BPB

Extends the MDLM baseline (#1403) with depth recurrence and quantization improvements.
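The "int8+zlib roundtrip" in the summary line can be illustrated with a minimal sketch. It assumes a single per-tensor scale and plain Python lists; the function name `int8_zlib_roundtrip` is hypothetical, and the submission itself selects clip thresholds per row, so this shows only the general shape of quantize, compress, decompress, dequantize.

```python
import struct
import zlib


def int8_zlib_roundtrip(weights):
    """Quantize to int8, zlib-compress, then restore. Returns (restored, nbytes).

    Assumption: one symmetric scale for the whole tensor (illustration only).
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    # Symmetric int8 codes in [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    blob = zlib.compress(struct.pack(f"{len(codes)}b", *codes), 9)
    # The artifact size is measured on the compressed bytes.
    restored_codes = struct.unpack(f"{len(codes)}b", zlib.decompress(blob))
    return [c * scale for c in restored_codes], len(blob)


restored, nbytes = int8_zlib_roundtrip([0.5, -1.0, 0.25, 0.0])
print(restored, nbytes)
```

Evaluating the model on the restored weights rather than the original ones is what makes the reported val_bpb a post-roundtrip figure.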

Stack

  • Depth recurrence: physical layers L1–L3 looped 1× extra → 12 effective layers / 9 physical layers
  • QAT (STE): straight-through quantization at lr_scale < 0.40 (~last 480 steps of 8,049 total)
  • EMA (decay=0.997) applied before serialization
  • GPTQ-lite: 5-candidate percentile clip search (99.9%→100%) per row, min-MSE selection
  • Linear LR → 0 (Muon warmdown), relu² MLP, Muon WD=0.01
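The GPTQ-lite item above can be sketched as a per-row search over clip thresholds. This is a minimal illustration, not the submission's code: the even spacing of the five percentiles between 99.9% and 100%, the symmetric int8 grid, and the helper names (`percentile_abs`, `quantize_row`, `best_clip`) are all assumptions.

```python
import random


def percentile_abs(row, pct):
    """Linearly interpolated percentile of |row|, with pct in [0, 100]."""
    mags = sorted(abs(w) for w in row)
    pos = (len(mags) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(mags) - 1)
    return mags[lo] + (mags[hi] - mags[lo]) * (pos - lo)


def quantize_row(row, clip):
    """Symmetric int8 quantize-dequantize of one row, clipping at +/-clip."""
    if clip == 0.0:
        return list(row)
    scale = clip / 127.0
    return [round(max(-clip, min(clip, w)) / scale) * scale for w in row]


def best_clip(row, percentiles=(99.9, 99.925, 99.95, 99.975, 100.0)):
    """Try each candidate percentile and keep the one with the lowest MSE.

    Returns (percentile, mse, clip). The candidate spacing is assumed.
    """
    best = None
    for pct in percentiles:
        clip = percentile_abs(row, pct)
        deq = quantize_row(row, clip)
        mse = sum((a - b) ** 2 for a, b in zip(row, deq)) / len(row)
        if best is None or mse < best[1]:
            best = (pct, mse, clip)
    return best


random.seed(0)
demo_row = [random.gauss(0.0, 1.0) for _ in range(2048)]
print(best_clip(demo_row))
```

Because 100% (no clipping) is one of the candidates, the selected threshold can never do worse in MSE than plain max-abs scaling for that row.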

Results (8×H100 SXM, seed=1337, 600s)

| Metric | This PR | #1403 |
| --- | --- | --- |
| Pre-quant val_bpb | 1.3379 | 1.3409 |
| Post-roundtrip val_bpb | 1.3428 | 1.3485 |
| Quant penalty | 0.0049 | 0.0076 |
| Artifact | 14.73 MB | 15.63 MB |
| Steps | 8,049 | 11,808 |
| ms/step | 74.6 ms | 50.8 ms |
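The derived numbers in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only the values reported above; the dictionary keys and helper names are hypothetical.

```python
# Values copied from the results table.
rows = {
    "this_pr": {"pre": 1.3379, "post": 1.3428, "steps": 8049, "ms_per_step": 74.6},
    "pr_1403": {"pre": 1.3409, "post": 1.3485, "steps": 11808, "ms_per_step": 50.8},
}


def quant_penalty(r):
    """Post-roundtrip minus pre-quant val_bpb, rounded to 4 decimals."""
    return round(r["post"] - r["pre"], 4)


def wall_clock_s(r):
    """Training time implied by step count and per-step latency, in seconds."""
    return r["steps"] * r["ms_per_step"] / 1000.0


for name, r in rows.items():
    print(name, quant_penalty(r), round(wall_clock_s(r), 1))

# Margin over #1403 on the post-roundtrip metric.
margin = round(rows["pr_1403"]["post"] - rows["this_pr"]["post"], 4)
print("margin", margin)
```

Both runs come out at roughly 600 s of training, so the step-count gap (8,049 vs 11,808) follows from the higher per-step cost of the recurrent model rather than from a shorter budget.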

EMA + GPTQ-lite cuts the quant penalty from 0.0076 to 0.0049. Depth recurrence improves pre-quant quality (1.3379 vs 1.3409) despite running fewer steps (8,049 vs 11,808), because each forward pass applies 12 effective layers of compute using only 9 layers' worth of parameters.
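How 9 physical layers become 12 effective ones can be sketched as a layer schedule. This is an illustration under assumptions: layers are taken to be 0-indexed, the looped block L1–L3 is assumed to be replayed as a whole right after its first pass, and `effective_schedule` and `forward` are hypothetical names.

```python
def effective_schedule(num_physical=9, loop_start=1, loop_end=3, extra_loops=1):
    """Order in which physical layer indices are visited in one forward pass."""
    schedule = []
    for i in range(num_physical):
        schedule.append(i)
        if i == loop_end:
            # Replay the looped block; the weights are shared, not copied.
            schedule.extend(list(range(loop_start, loop_end + 1)) * extra_loops)
    return schedule


def forward(x, layers, schedule):
    """Apply the layers in schedule order, reusing the looped ones."""
    for idx in schedule:
        x = layers[idx](x)
    return x


schedule = effective_schedule()
print(schedule, len(schedule), len(set(schedule)))

# Toy "layers" that just add their own index, to show weight reuse.
toy_layers = [lambda x, k=k: x + k for k in range(9)]
print(forward(0, toy_layers, schedule))
```

The parameter count, and therefore the artifact size, depends on the number of distinct indices in the schedule, while the compute per step depends on its length, which is consistent with the higher ms/step in the table.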

Extends PR openai#1403 MDLM baseline with depth recurrence (L1-L3 looped 1x
extra = 12 effective layers), QAT/STE, EMA decay=0.997, GPTQ-lite clip
search, linear LR->0, relu^2 MLP, Muon WD=0.01.

val_bpb: 1.3428 | quant penalty: 0.0049 | artifact: 14.73 MB
8xH100 SXM, 600s, seed=1337

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>